Solutions to this workshop can be found here
This class will show you a tiny bit of plotting using the built-in R functions, but will quickly veer into a very popular R package called ggplot2, which is often referred to as just “ggplot”. This is likely the first place in this course where you will see things that are very easy to do in R but would be much more complicated (or even impossible) in Excel.
Let’s load in and consider the penguins dataset that we played with when learning about dataframes. We had loaded this in from a .csv file.
# Load necessary packages
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr) # A package that converts between table formats
# read in the penguin data and save it to a variable called penguins
penguins <- read.csv(file = "penguins.csv",
header = TRUE,
row.names = 1)
Let’s use base R to plot bill length vs. bill depth for all the data:
plot(penguins$bill_length_mm, penguins$bill_depth_mm)
Pretty straightforward; the command is plot(x, y).
# Try making a plot of penguin flipper length vs. body mass
# Comparing these two plots, what conclusions would you draw? How could these plots be improved?
You can further modify the plot if you want to change the way the points look, etc. As I mentioned, we won’t be going deep into the details of the regular R plot function.
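For reference, here is a minimal sketch of a few common options of the base plot() function: pch (point symbol), col (point color), and xlab/ylab/main (labels). To keep it runnable without penguins.csv, it uses a small data frame (penguins_demo, a name made up for this example) hand-copied from rows of the penguins output shown later in this workshop; the color choices are arbitrary.

```r
# Demo rows hand-copied from the penguins output later in this workshop
# (Adelie rows 1-3, Gentoo rows 153-155), so no CSV file is needed
penguins_demo <- data.frame(
  species        = c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"),
  bill_length_mm = c(39.1, 39.5, 40.3, 46.1, 50.0, 48.7),
  bill_depth_mm  = c(18.7, 17.4, 18.0, 13.2, 16.3, 14.1)
)

# One (arbitrary) color per species, looked up for every row
species_colors <- c(Adelie = "darkorange", Gentoo = "steelblue")
point_colors <- species_colors[penguins_demo$species]

plot(penguins_demo$bill_length_mm, penguins_demo$bill_depth_mm,
     pch  = 19,             # filled circles instead of the default open circles
     col  = point_colors,   # color each point by its species
     xlab = "Bill length (mm)",
     ylab = "Bill depth (mm)",
     main = "Penguin bills")

# Base R does not add a legend automatically, so we draw one ourselves
legend("topright", legend = names(species_colors),
       col = species_colors, pch = 19)
```

Compare this with the ggplot version below, where the legend comes for free.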
Let’s try to use ggplot to plot the same data. First, install ggplot2 (if you haven’t already done so) and load it into your R session.
# uncomment the line below and install ggplot2 only if you haven't already
#install.packages('ggplot2')
# load the ggplot2 library into the current R session
library(ggplot2)
The same plot as the one we made above is actually a bit more complicated to put together in ggplot:
ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
geom_point()
## Warning: Removed 2 rows containing missing values (geom_point).
The code above contains the bare minimum components of a ggplot plot; we can add more later, but first let’s dissect the parts of this command:
ggplot(data = <DATA>, mapping = aes(<Mapping>)) +
<GEOM_FUNCTION>()
Arguments like data and mapping can also go inside the parentheses of the geom function instead, producing the same plot as above:
ggplot() +
geom_point(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm))
## Warning: Removed 2 rows containing missing values (geom_point).
There are specific situations in which it’s better to do this (we’ll see them later).
We can also pass additional, fixed arguments to the geom; useful ones to know include alpha (transparency), shape and size.
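As a sketch of these fixed settings, the example below (using a small, hand-copied penguins_demo data frame rather than the full dataset) sets alpha, size, shape, color and fill directly in geom_point(), outside aes( ), so they apply to every point. Shape 21 is a filled circle whose outline is controlled by color and whose inside is controlled by fill.

```r
library(ggplot2)

# Hand-copied demo rows (Adelie rows 1-3, Gentoo rows 153-155)
penguins_demo <- data.frame(
  bill_length_mm = c(39.1, 39.5, 40.3, 46.1, 50.0, 48.7),
  bill_depth_mm  = c(18.7, 17.4, 18.0, 13.2, 16.3, 14.1)
)

# Fixed, variable-independent settings go in geom_point(), outside aes()
fixed_style_plot <-
  ggplot(data = penguins_demo, mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point(alpha = 0.5,         # 50% opaque
             size  = 3,           # larger than the default
             shape = 21,          # circle with separate outline and fill
             color = "black",     # outline color
             fill  = "steelblue") # inside color
print(fixed_style_plot)
```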
The plot we made above isn’t really all that useful. It’s great to see the data across all three species on one plot, but if we’re looking at this data, we’re probably actually interested in how these species differ from each other. So how do we make ggplot visually separate the points by species?
Remember that the mapping argument deals with any properties of the plot that depend on variables in the supplied data frame. So we can modify our original code like this:
ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm, color = species, fill = species)) +
geom_point(alpha = 0.33, shape = 23, size = 5)
## Warning: Removed 2 rows containing missing values (geom_point).
Notice that the plot above uses both variable-dependent colors (color and fill, based on the species column of the penguins dataframe), which go inside aes( ), and variable-independent values (alpha, shape, size) that apply to the whole geom_point command and go outside aes( ).
Also, notice that you got a legend for free! You didn’t have to tell ggplot how to make it, or what info to include in it; it knows automatically based on how you set up your mapping.
Depending on context, you can make color, fill, shape, size or alpha variable-dependent. Some of these (color, fill, shape) make more sense for categorical variables, while others (alpha, size) make more sense for continuous variables, but ggplot will rarely stop you from making questionable choices, either aesthetically or in how the data are represented.
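A minimal sketch of mapping a continuous variable: here body mass is mapped to point size while species is mapped to color, using a few hand-copied rows (penguins_demo is a made-up name for this demo data). ggplot builds a separate legend for each mapping.

```r
library(ggplot2)

# Hand-copied demo rows (Adelie rows 1-3, Gentoo rows 153-155)
penguins_demo <- data.frame(
  species           = c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"),
  flipper_length_mm = c(181, 186, 195, 211, 230, 210),
  bill_length_mm    = c(39.1, 39.5, 40.3, 46.1, 50.0, 48.7),
  body_mass_g       = c(3750, 3800, 3250, 4500, 5700, 4450)
)

mass_size_plot <-
  ggplot(data = penguins_demo,
         mapping = aes(x = flipper_length_mm, y = bill_length_mm,
                       color = species,        # categorical variable -> color
                       size  = body_mass_g)) + # continuous variable -> point size
  geom_point(alpha = 0.6)                      # fixed transparency for all points
print(mass_size_plot)
```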
Let’s try an exercise:
# Based on the code above, make a plot where body mass is on the x axis,
# flipper length is on the y axis,
# the fill of the points depends on the island,
# and the shape of the point depends on the species
One of the places where ggplot really shines is combining multiple data representations on one plot. For example, I really like topographic-style contour plots, which ggplot can make with geom_density2d. Once we know how to make a basic plot, combining a contour plot with a plot of the individual data points is easy in ggplot:
# note: geom_density2d() is the only new layer; the rest is a simplified version of our plot from above
ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
geom_density2d() +
geom_point(alpha = 0.33)
## Warning: Removed 2 rows containing non-finite values (stat_density2d).
## Warning: Removed 2 rows containing missing values (geom_point).
Notice that the alpha argument we provided only applies to geom_point, so the contour lines don’t show any transparency. However, any arguments provided to mapping in an aes( ) statement in the ggplot( ) command apply across all geoms. (Also, notice that when we add a geom, ggplot automatically updates our legend!)
As you’ve seen, ggplot makes it easy to change the appearance of the plot, and the statistics calculated, based on any single column in the dataframe containing the data to be plotted. But this also comes with some fairly rigid rules about how your data need to be organized. Namely, data for ggplot should be in tidy format: each variable is a column, each observation is a row, and each value is a single cell.
Let’s take a look at what that means. Imagine the penguin data were collected for breeding pairs. Below we create two subsets of the penguins dataframe to mimic this; both contain the same data on the same individuals, with each breeding pair made up of one male and one female (two pairs per species).
penguin_pairs_1 <- penguins[c(1,2,5,6,155:158,277:280),
c('species', 'sex', 'bill_length_mm')]
penguin_pairs_1$breeding_pair <- rep(1:6, each = 2)
penguin_pairs_2 <-
pivot_wider(penguin_pairs_1, names_from = 'sex',
values_from = 'bill_length_mm', names_prefix = 'bill_length_mm_')
print(penguin_pairs_1)
## species sex bill_length_mm breeding_pair
## 1 Adelie male 39.1 1
## 2 Adelie female 39.5 1
## 5 Adelie female 36.7 2
## 6 Adelie male 39.3 2
## 155 Gentoo female 48.7 3
## 156 Gentoo male 50.0 3
## 157 Gentoo male 47.6 4
## 158 Gentoo female 46.5 4
## 277 Chinstrap female 46.5 5
## 278 Chinstrap male 50.0 5
## 279 Chinstrap male 51.3 6
## 280 Chinstrap female 45.4 6
print(penguin_pairs_2)
## # A tibble: 6 x 4
## species breeding_pair bill_length_mm_male bill_length_mm_female
## <chr> <int> <dbl> <dbl>
## 1 Adelie 1 39.1 39.5
## 2 Adelie 2 39.3 36.7
## 3 Gentoo 3 50 48.7
## 4 Gentoo 4 47.6 46.5
## 5 Chinstrap 5 50 46.5
## 6 Chinstrap 6 51.3 45.4
Imagine we had two graphs we wanted to make:
For each of these graphs, what are the individual observations (i.e. are they breeding pairs or individual penguins)? Which is the easiest dataset to use for plotting each of these graphs with ggplot?
Try both of them out below.
The tidyr package (which, like ggplot2, is part of the tidyverse collection of packages) has some really great functions for reorganizing data, allowing you to convert from something that looks like penguin_pairs_1 into penguin_pairs_2, and vice versa. If you find yourself facing data that isn’t organized the right way for your plot, I strongly suggest looking over David Gresham’s tidyverse tutorial and the more up-to-date tidyr tutorial on pivoting.
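The reverse direction, from one row per breeding pair back to one row per penguin, is handled by tidyr's pivot_longer(). The sketch below assumes a two-row, hand-copied excerpt of penguin_pairs_2 (called pairs_wide here) so it runs on its own; names_prefix strips the shared bill_length_mm_ prefix so that only male or female ends up in the new sex column.

```r
library(tidyr)

# Hand-copied from the penguin_pairs_2 output above (breeding pairs 1 and 2)
pairs_wide <- data.frame(
  species               = c("Adelie", "Adelie"),
  breeding_pair         = c(1L, 2L),
  bill_length_mm_male   = c(39.1, 39.3),
  bill_length_mm_female = c(39.5, 36.7)
)

# One row per breeding pair -> one row per individual penguin
pairs_long <- pivot_longer(pairs_wide,
                           cols         = starts_with("bill_length_mm_"),
                           names_to     = "sex",             # new column for the old column names
                           names_prefix = "bill_length_mm_", # drop the shared prefix
                           values_to    = "bill_length_mm")  # new column for the values
print(pairs_long)
```

The result has the same layout as penguin_pairs_1: each row is one penguin, which is what ggplot needs if, for example, you want to color points by sex.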
Try plotting some of your own data! Here are some commonly used types of plots (and their corresponding geoms) for data visualization:
* scatter plots: geom_point()
* density plots: geom_density()
* histograms: geom_histogram()
* boxplots: geom_boxplot()
* barplots: geom_bar()
* line plots: geom_line()
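As a quick sketch of two of these geoms, the example below draws a histogram and a boxplot of body mass. The 15 body masses were hand-copied from five rows per species of the printed output later in this workshop (skipping the Adelie row with a missing value), and the binwidth of 250 g is an arbitrary choice.

```r
library(ggplot2)

# Hand-copied body masses: Adelie rows 1, 2, 3, 5, 6; Gentoo rows 153-157;
# Chinstrap rows 277-281
mass_demo <- data.frame(
  species     = rep(c("Adelie", "Gentoo", "Chinstrap"), each = 5),
  body_mass_g = c(3750, 3800, 3250, 3450, 3650,
                  4500, 5700, 4450, 5700, 5400,
                  3500, 3900, 3650, 3525, 3725)
)

# Histogram: binwidth is in the units of the x variable (grams here)
mass_histogram <- ggplot(mass_demo, aes(x = body_mass_g, fill = species)) +
  geom_histogram(binwidth = 250)

# Boxplot: one box per species along the x axis
mass_boxplot <- ggplot(mass_demo, aes(x = species, y = body_mass_g)) +
  geom_boxplot()

print(mass_histogram)
print(mass_boxplot)
```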
ggplot actually creates objects that we can store as variables and add onto. So, for example, we can do this:
basic_penguin_plot <-
ggplot(data = penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
geom_point()
print(basic_penguin_plot)
## Warning: Removed 2 rows containing missing values (geom_point).
# let's add another geom to this plot
penguin_plot_with_contours <-
basic_penguin_plot + geom_density2d()
print(penguin_plot_with_contours)
## Warning: Removed 2 rows containing non-finite values (stat_density2d).
## Warning: Removed 2 rows containing missing values (geom_point).
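Because plots are ordinary R objects, adding a layer with + returns a new plot and leaves the original untouched. A small sketch, again with hand-copied demo rows, that checks this by counting the layers stored in each object:

```r
library(ggplot2)

# Hand-copied demo rows (Adelie rows 1-3, Gentoo rows 153-155)
penguins_demo <- data.frame(
  species        = c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"),
  bill_length_mm = c(39.1, 39.5, 40.3, 46.1, 50.0, 48.7),
  bill_depth_mm  = c(18.7, 17.4, 18.0, 13.2, 16.3, 14.1)
)

base_plot <- ggplot(penguins_demo,
                    aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point()

# `+` returns a new plot object; base_plot itself is not modified
plot_with_lines <- base_plot + geom_line()

length(base_plot$layers)        # 1 (points only)
length(plot_with_lines$layers)  # 2 (points and lines)
```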
ggplot also allows a huge amount of control over other aspects of the plot (e.g. titles, axis labeling and scale, overall plot look, etc). For most of these, ggplot actually allows multiple equivalent ways to achieve the same effect.
Adding a title to a plot can be achieved using ggtitle():
basic_penguin_plot +
ggtitle('Penguin Bills')
## Warning: Removed 2 rows containing missing values (geom_point).
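ggtitle() is one of those multiple equivalent ways; labs() is another, and it can set the title, the axis labels and the legend title in one call. A sketch with hand-copied demo rows (the label text is just an example):

```r
library(ggplot2)

# Hand-copied demo rows (Adelie rows 1-3, Gentoo rows 153-155)
penguins_demo <- data.frame(
  species        = c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"),
  bill_length_mm = c(39.1, 39.5, 40.3, 46.1, 50.0, 48.7),
  bill_depth_mm  = c(18.7, 17.4, 18.0, 13.2, 16.3, 14.1)
)

labelled_plot <-
  ggplot(penguins_demo, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  labs(title = "Penguin Bills",     # same effect as ggtitle('Penguin Bills')
       x     = "Bill Length (mm)",  # x axis label
       y     = "Bill Depth (mm)",   # y axis label
       color = "Species")           # legend title for the color mapping
print(labelled_plot)
```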
We can also modify the axis properties directly:
basic_penguin_plot +
ggtitle('Penguin Bills') +
scale_x_continuous(name = 'Bill Length',
limits = c(0,60)) +
scale_y_log10(name = 'Bill Depth',
breaks = c(15, 18, 21))
## Warning: Removed 2 rows containing missing values (geom_point).
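There are a few things going on here: ggtitle() adds the title, scale_x_continuous() renames the x axis and sets its limits to 0 and 60, and scale_y_log10() renames the y axis, plots it on a log10 scale, and places tick marks (breaks) at 15, 18 and 21.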
You can modify the legend in a similar way to the other mappings (e.g. the axes); for example, if we want to modify the way the thing mapped to ‘color’ on our plot is represented, we can use scale_color_discrete( ), or, if we want to manually change the values assigned to each category (e.g. the colors), scale_color_manual( ):
basic_penguin_plot +
scale_color_manual(values=c("violet", "blue", "gray"),
name="Penguin Species",
labels=c("Adelie", "Chinstrap", "Gentoo"))
## Warning: Removed 2 rows containing missing values (geom_point).
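One thing to watch with scale_color_manual( ): unnamed values and labels are matched to the species in their sorted (alphabetical) order. Passing a named vector of values ties each color to a species explicitly. The sketch below, using hand-copied Adelie and Gentoo rows, checks which color each point receives via ggplot_build().

```r
library(ggplot2)

# Hand-copied demo rows (Adelie rows 1-3, Gentoo rows 153-155)
penguins_demo <- data.frame(
  species        = c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"),
  bill_length_mm = c(39.1, 39.5, 40.3, 46.1, 50.0, 48.7),
  bill_depth_mm  = c(18.7, 17.4, 18.0, 13.2, 16.3, 14.1)
)

named_color_plot <-
  ggplot(penguins_demo, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  # Named values: each color is tied to a species by name, not by position
  scale_color_manual(values = c(Gentoo = "gray", Adelie = "violet"),
                     name = "Penguin Species")

# ggplot_build() computes the final aesthetics, including each point's color
point_colors <- ggplot_build(named_color_plot)$data[[1]]$colour
print(point_colors)
```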
We can also change the position of the legend using theme( ) (which can actually control nearly every other aesthetic aspect of the plot, such as font size, which axes get labels/tickmarks, etc).
basic_penguin_plot +
scale_color_manual(values=c("violet", "blue", "gray"),
name="Penguin Species",
labels=c("Adelie", "Chinstrap", "Gentoo")) +
theme(legend.position = 'bottom')
## Warning: Removed 2 rows containing missing values (geom_point).
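Finally, the overall appearance of the graph can be changed by selecting a complete ‘theme’ such as theme_bw(); this is a bit confusing, since these are distinct from the theme( ) command used above. A complete theme replaces all earlier theme settings, so add it before any theme( ) tweaks (if theme_bw() came last here, it would reset the legend back to the right side):
basic_penguin_plot +
scale_color_manual(values=c("violet", "blue", "gray"),
name="Penguin Species",
labels=c("Adelie", "Chinstrap", "Gentoo")) +
theme_bw() +
theme(legend.position = 'bottom')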
## Warning: Removed 2 rows containing missing values (geom_point).
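The order in which themes are added matters: a complete theme such as theme_bw() replaces every earlier theme( ) setting. This sketch (with hand-copied demo rows) compares the two orders by reading the legend.position setting stored in each plot object.

```r
library(ggplot2)

# Hand-copied demo rows (Adelie rows 1-3, Gentoo rows 153-155)
penguins_demo <- data.frame(
  species        = c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"),
  bill_length_mm = c(39.1, 39.5, 40.3, 46.1, 50.0, 48.7),
  bill_depth_mm  = c(18.7, 17.4, 18.0, 13.2, 16.3, 14.1)
)

base_plot <- ggplot(penguins_demo,
                    aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point()

# Complete theme added last: it overwrites the earlier legend.position tweak
legend_lost <- base_plot + theme(legend.position = "bottom") + theme_bw()

# Complete theme added first: the later theme() tweak is kept
legend_kept <- base_plot + theme_bw() + theme(legend.position = "bottom")

legend_lost$theme$legend.position  # "right", the theme_bw() default
legend_kept$theme$legend.position  # "bottom"
```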
One really powerful application of this is that we can actually make each geom( ) represent a different aspect of the same data. Let’s say we’d like our datapoints to be colored by species, but we’d also like to see a contour plot of bill length vs depth across all the species. To do this, we’re going to have to move our mapping calls inside the geoms, since we now want each geom to map the data differently:
# Removed alpha for simplicity
# Made contour plot line color black (default is blue)
ggplot(data = penguins) +
geom_density2d(mapping = aes(x = bill_length_mm, y = bill_depth_mm), color = 'black') +
geom_point(mapping = aes(x = bill_length_mm, y = bill_depth_mm, color = species))
## Warning: Removed 2 rows containing non-finite values (stat_density2d).
## Warning: Removed 2 rows containing missing values (geom_point).
# can also be written as
ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
geom_density2d(color = 'black') +
geom_point(mapping = aes(color = species))
This plot shows that mapping controls not just where the data points are plotted and how they look, but also how the data are grouped when they’re represented in the plot. Notice that in our first contour plot, where color = species was set in ggplot( ), the statistics needed to draw the contours were computed separately for each species. However, when we removed species from the aes( ) used by geom_density2d, the data were no longer separated by species for any of the stats calculated for this geom; instead, they were calculated across all the points in the dataset.
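One way to see this grouping directly is to build the plot with ggplot_build() and count the groups in the computed layer data. The sketch below uses geom_density() on 15 hand-copied body masses (five per species) rather than geom_density2d(), to keep the example small; the same grouping rule applies.

```r
library(ggplot2)

# Hand-copied body masses: Adelie rows 1, 2, 3, 5, 6; Gentoo rows 153-157;
# Chinstrap rows 277-281
mass_demo <- data.frame(
  species     = rep(c("Adelie", "Gentoo", "Chinstrap"), each = 5),
  body_mass_g = c(3750, 3800, 3250, 3450, 3650,
                  4500, 5700, 4450, 5700, 5400,
                  3500, 3900, 3650, 3525, 3725)
)

# species mapped: the density is estimated separately for each species
per_species <- ggplot(mass_demo, aes(x = body_mass_g, color = species)) +
  geom_density()

# species not mapped: one density is estimated from all rows together
pooled <- ggplot(mass_demo, aes(x = body_mass_g)) +
  geom_density()

length(unique(ggplot_build(per_species)$data[[1]]$group))  # 3
length(unique(ggplot_build(pooled)$data[[1]]$group))       # 1
```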
Let’s try an exercise. A really useful kind of plot while exploring data is a density plot, made with geom_density, which shows a smoothed, normalized histogram of your data. For example, to get an idea of the distribution of bill lengths in our dataset, we can run:
# Density plot to see the distribution of bill lengths in our data
ggplot(data = penguins) +
geom_density(mapping = aes(x = bill_length_mm))
## Warning: Removed 2 rows containing non-finite values (stat_density).
Now repeat this plot, but overlaying the density plot for each species on this plot that shows the distribution across all 3 species’ data:
# Make a density plot that shows both the distribution of bill_length_mm in all
# the data together in one color, and the distribution for each species'
# bill_length_mm each in its own color
# Bonus: Change the linetype of the species' density plots so that each species
# has the same dashed line, but the line representing results across all the
# data is solid
ggplot makes it super easy to combine multiple datasets on one plot, assuming they have the relevant variables (dataframe columns) in common. Let’s break up the penguins dataframe to see how this works:
penguins_nonchinstrap <- subset(penguins, species != 'Chinstrap')
penguin_chinstrap_mass <- subset(penguins, species == 'Chinstrap')[, c('body_mass_g', 'species')]
print(penguins_nonchinstrap)
## species island bill_length_mm bill_depth_mm flipper_length_mm
## 1 Adelie Torgersen 39.1 18.7 181
## 2 Adelie Torgersen 39.5 17.4 186
## 3 Adelie Torgersen 40.3 18.0 195
## 4 Adelie Torgersen NA NA NA
## 5 Adelie Torgersen 36.7 19.3 193
## 6 Adelie Torgersen 39.3 20.6 190
## 7 Adelie Torgersen 38.9 17.8 181
## 8 Adelie Torgersen 39.2 19.6 195
## 9 Adelie Torgersen 34.1 18.1 193
## 10 Adelie Torgersen 42.0 20.2 190
## 11 Adelie Torgersen 37.8 17.1 186
## 12 Adelie Torgersen 37.8 17.3 180
## 13 Adelie Torgersen 41.1 17.6 182
## 14 Adelie Torgersen 38.6 21.2 191
## 15 Adelie Torgersen 34.6 21.1 198
## 16 Adelie Torgersen 36.6 17.8 185
## 17 Adelie Torgersen 38.7 19.0 195
## 18 Adelie Torgersen 42.5 20.7 197
## 19 Adelie Torgersen 34.4 18.4 184
## 20 Adelie Torgersen 46.0 21.5 194
## 21 Adelie Biscoe 37.8 18.3 174
## 22 Adelie Biscoe 37.7 18.7 180
## 23 Adelie Biscoe 35.9 19.2 189
## 24 Adelie Biscoe 38.2 18.1 185
## 25 Adelie Biscoe 38.8 17.2 180
## 26 Adelie Biscoe 35.3 18.9 187
## 27 Adelie Biscoe 40.6 18.6 183
## 28 Adelie Biscoe 40.5 17.9 187
## 29 Adelie Biscoe 37.9 18.6 172
## 30 Adelie Biscoe 40.5 18.9 180
## 31 Adelie Dream 39.5 16.7 178
## 32 Adelie Dream 37.2 18.1 178
## 33 Adelie Dream 39.5 17.8 188
## 34 Adelie Dream 40.9 18.9 184
## 35 Adelie Dream 36.4 17.0 195
## 36 Adelie Dream 39.2 21.1 196
## 37 Adelie Dream 38.8 20.0 190
## 38 Adelie Dream 42.2 18.5 180
## 39 Adelie Dream 37.6 19.3 181
## 40 Adelie Dream 39.8 19.1 184
## 41 Adelie Dream 36.5 18.0 182
## 42 Adelie Dream 40.8 18.4 195
## 43 Adelie Dream 36.0 18.5 186
## 44 Adelie Dream 44.1 19.7 196
## 45 Adelie Dream 37.0 16.9 185
## 46 Adelie Dream 39.6 18.8 190
## 47 Adelie Dream 41.1 19.0 182
## 48 Adelie Dream 37.5 18.9 179
## 49 Adelie Dream 36.0 17.9 190
## 50 Adelie Dream 42.3 21.2 191
## 51 Adelie Biscoe 39.6 17.7 186
## 52 Adelie Biscoe 40.1 18.9 188
## 53 Adelie Biscoe 35.0 17.9 190
## 54 Adelie Biscoe 42.0 19.5 200
## 55 Adelie Biscoe 34.5 18.1 187
## 56 Adelie Biscoe 41.4 18.6 191
## 57 Adelie Biscoe 39.0 17.5 186
## 58 Adelie Biscoe 40.6 18.8 193
## 59 Adelie Biscoe 36.5 16.6 181
## 60 Adelie Biscoe 37.6 19.1 194
## 61 Adelie Biscoe 35.7 16.9 185
## 62 Adelie Biscoe 41.3 21.1 195
## 63 Adelie Biscoe 37.6 17.0 185
## 64 Adelie Biscoe 41.1 18.2 192
## 65 Adelie Biscoe 36.4 17.1 184
## 66 Adelie Biscoe 41.6 18.0 192
## 67 Adelie Biscoe 35.5 16.2 195
## 68 Adelie Biscoe 41.1 19.1 188
## 69 Adelie Torgersen 35.9 16.6 190
## 70 Adelie Torgersen 41.8 19.4 198
## 71 Adelie Torgersen 33.5 19.0 190
## 72 Adelie Torgersen 39.7 18.4 190
## 73 Adelie Torgersen 39.6 17.2 196
## 74 Adelie Torgersen 45.8 18.9 197
## 75 Adelie Torgersen 35.5 17.5 190
## 76 Adelie Torgersen 42.8 18.5 195
## 77 Adelie Torgersen 40.9 16.8 191
## 78 Adelie Torgersen 37.2 19.4 184
## 79 Adelie Torgersen 36.2 16.1 187
## 80 Adelie Torgersen 42.1 19.1 195
## 81 Adelie Torgersen 34.6 17.2 189
## 82 Adelie Torgersen 42.9 17.6 196
## 83 Adelie Torgersen 36.7 18.8 187
## 84 Adelie Torgersen 35.1 19.4 193
## 85 Adelie Dream 37.3 17.8 191
## 86 Adelie Dream 41.3 20.3 194
## 87 Adelie Dream 36.3 19.5 190
## 88 Adelie Dream 36.9 18.6 189
## 89 Adelie Dream 38.3 19.2 189
## 90 Adelie Dream 38.9 18.8 190
## 91 Adelie Dream 35.7 18.0 202
## 92 Adelie Dream 41.1 18.1 205
## 93 Adelie Dream 34.0 17.1 185
## 94 Adelie Dream 39.6 18.1 186
## 95 Adelie Dream 36.2 17.3 187
## 96 Adelie Dream 40.8 18.9 208
## 97 Adelie Dream 38.1 18.6 190
## 98 Adelie Dream 40.3 18.5 196
## 99 Adelie Dream 33.1 16.1 178
## 100 Adelie Dream 43.2 18.5 192
## 101 Adelie Biscoe 35.0 17.9 192
## 102 Adelie Biscoe 41.0 20.0 203
## 103 Adelie Biscoe 37.7 16.0 183
## 104 Adelie Biscoe 37.8 20.0 190
## 105 Adelie Biscoe 37.9 18.6 193
## 106 Adelie Biscoe 39.7 18.9 184
## 107 Adelie Biscoe 38.6 17.2 199
## 108 Adelie Biscoe 38.2 20.0 190
## 109 Adelie Biscoe 38.1 17.0 181
## 110 Adelie Biscoe 43.2 19.0 197
## 111 Adelie Biscoe 38.1 16.5 198
## 112 Adelie Biscoe 45.6 20.3 191
## 113 Adelie Biscoe 39.7 17.7 193
## 114 Adelie Biscoe 42.2 19.5 197
## 115 Adelie Biscoe 39.6 20.7 191
## 116 Adelie Biscoe 42.7 18.3 196
## 117 Adelie Torgersen 38.6 17.0 188
## 118 Adelie Torgersen 37.3 20.5 199
## 119 Adelie Torgersen 35.7 17.0 189
## 120 Adelie Torgersen 41.1 18.6 189
## 121 Adelie Torgersen 36.2 17.2 187
## 122 Adelie Torgersen 37.7 19.8 198
## 123 Adelie Torgersen 40.2 17.0 176
## 124 Adelie Torgersen 41.4 18.5 202
## 125 Adelie Torgersen 35.2 15.9 186
## 126 Adelie Torgersen 40.6 19.0 199
## 127 Adelie Torgersen 38.8 17.6 191
## 128 Adelie Torgersen 41.5 18.3 195
## 129 Adelie Torgersen 39.0 17.1 191
## 130 Adelie Torgersen 44.1 18.0 210
## 131 Adelie Torgersen 38.5 17.9 190
## 132 Adelie Torgersen 43.1 19.2 197
## 133 Adelie Dream 36.8 18.5 193
## 134 Adelie Dream 37.5 18.5 199
## 135 Adelie Dream 38.1 17.6 187
## 136 Adelie Dream 41.1 17.5 190
## 137 Adelie Dream 35.6 17.5 191
## 138 Adelie Dream 40.2 20.1 200
## 139 Adelie Dream 37.0 16.5 185
## 140 Adelie Dream 39.7 17.9 193
## 141 Adelie Dream 40.2 17.1 193
## 142 Adelie Dream 40.6 17.2 187
## 143 Adelie Dream 32.1 15.5 188
## 144 Adelie Dream 40.7 17.0 190
## 145 Adelie Dream 37.3 16.8 192
## 146 Adelie Dream 39.0 18.7 185
## 147 Adelie Dream 39.2 18.6 190
## 148 Adelie Dream 36.6 18.4 184
## 149 Adelie Dream 36.0 17.8 195
## 150 Adelie Dream 37.8 18.1 193
## 151 Adelie Dream 36.0 17.1 187
## 152 Adelie Dream 41.5 18.5 201
## 153 Gentoo Biscoe 46.1 13.2 211
## 154 Gentoo Biscoe 50.0 16.3 230
## 155 Gentoo Biscoe 48.7 14.1 210
## 156 Gentoo Biscoe 50.0 15.2 218
## 157 Gentoo Biscoe 47.6 14.5 215
## 158 Gentoo Biscoe 46.5 13.5 210
## 159 Gentoo Biscoe 45.4 14.6 211
## 160 Gentoo Biscoe 46.7 15.3 219
## 161 Gentoo Biscoe 43.3 13.4 209
## 162 Gentoo Biscoe 46.8 15.4 215
## 163 Gentoo Biscoe 40.9 13.7 214
## 164 Gentoo Biscoe 49.0 16.1 216
## 165 Gentoo Biscoe 45.5 13.7 214
## 166 Gentoo Biscoe 48.4 14.6 213
## 167 Gentoo Biscoe 45.8 14.6 210
## 168 Gentoo Biscoe 49.3 15.7 217
## 169 Gentoo Biscoe 42.0 13.5 210
## 170 Gentoo Biscoe 49.2 15.2 221
## 171 Gentoo Biscoe 46.2 14.5 209
## 172 Gentoo Biscoe 48.7 15.1 222
## 173 Gentoo Biscoe 50.2 14.3 218
## 174 Gentoo Biscoe 45.1 14.5 215
## 175 Gentoo Biscoe 46.5 14.5 213
## 176 Gentoo Biscoe 46.3 15.8 215
## 177 Gentoo Biscoe 42.9 13.1 215
## 178 Gentoo Biscoe 46.1 15.1 215
## 179 Gentoo Biscoe 44.5 14.3 216
## 180 Gentoo Biscoe 47.8 15.0 215
## 181 Gentoo Biscoe 48.2 14.3 210
## 182 Gentoo Biscoe 50.0 15.3 220
## 183 Gentoo Biscoe 47.3 15.3 222
## 184 Gentoo Biscoe 42.8 14.2 209
## 185 Gentoo Biscoe 45.1 14.5 207
## 186 Gentoo Biscoe 59.6 17.0 230
## 187 Gentoo Biscoe 49.1 14.8 220
## 188 Gentoo Biscoe 48.4 16.3 220
## 189 Gentoo Biscoe 42.6 13.7 213
## 190 Gentoo Biscoe 44.4 17.3 219
## 191 Gentoo Biscoe 44.0 13.6 208
## 192 Gentoo Biscoe 48.7 15.7 208
## 193 Gentoo Biscoe 42.7 13.7 208
## 194 Gentoo Biscoe 49.6 16.0 225
## 195 Gentoo Biscoe 45.3 13.7 210
## 196 Gentoo Biscoe 49.6 15.0 216
## 197 Gentoo Biscoe 50.5 15.9 222
## 198 Gentoo Biscoe 43.6 13.9 217
## 199 Gentoo Biscoe 45.5 13.9 210
## 200 Gentoo Biscoe 50.5 15.9 225
## 201 Gentoo Biscoe 44.9 13.3 213
## 202 Gentoo Biscoe 45.2 15.8 215
## 203 Gentoo Biscoe 46.6 14.2 210
## 204 Gentoo Biscoe 48.5 14.1 220
## 205 Gentoo Biscoe 45.1 14.4 210
## 206 Gentoo Biscoe 50.1 15.0 225
## 207 Gentoo Biscoe 46.5 14.4 217
## 208 Gentoo Biscoe 45.0 15.4 220
## 209 Gentoo Biscoe 43.8 13.9 208
## 210 Gentoo Biscoe 45.5 15.0 220
## 211 Gentoo Biscoe 43.2 14.5 208
## 212 Gentoo Biscoe 50.4 15.3 224
## 213 Gentoo Biscoe 45.3 13.8 208
## 214 Gentoo Biscoe 46.2 14.9 221
## 215 Gentoo Biscoe 45.7 13.9 214
## 216 Gentoo Biscoe 54.3 15.7 231
## 217 Gentoo Biscoe 45.8 14.2 219
## 218 Gentoo Biscoe 49.8 16.8 230
## 219 Gentoo Biscoe 46.2 14.4 214
## 220 Gentoo Biscoe 49.5 16.2 229
## 221 Gentoo Biscoe 43.5 14.2 220
## 222 Gentoo Biscoe 50.7 15.0 223
## 223 Gentoo Biscoe 47.7 15.0 216
## 224 Gentoo Biscoe 46.4 15.6 221
## 225 Gentoo Biscoe 48.2 15.6 221
## 226 Gentoo Biscoe 46.5 14.8 217
## 227 Gentoo Biscoe 46.4 15.0 216
## 228 Gentoo Biscoe 48.6 16.0 230
## 229 Gentoo Biscoe 47.5 14.2 209
## 230 Gentoo Biscoe 51.1 16.3 220
## 231 Gentoo Biscoe 45.2 13.8 215
## 232 Gentoo Biscoe 45.2 16.4 223
## 233 Gentoo Biscoe 49.1 14.5 212
## 234 Gentoo Biscoe 52.5 15.6 221
## 235 Gentoo Biscoe 47.4 14.6 212
## 236 Gentoo Biscoe 50.0 15.9 224
## 237 Gentoo Biscoe 44.9 13.8 212
## 238 Gentoo Biscoe 50.8 17.3 228
## 239 Gentoo Biscoe 43.4 14.4 218
## 240 Gentoo Biscoe 51.3 14.2 218
## 241 Gentoo Biscoe 47.5 14.0 212
## 242 Gentoo Biscoe 52.1 17.0 230
## 243 Gentoo Biscoe 47.5 15.0 218
## 244 Gentoo Biscoe 52.2 17.1 228
## 245 Gentoo Biscoe 45.5 14.5 212
## 246 Gentoo Biscoe 49.5 16.1 224
## 247 Gentoo Biscoe 44.5 14.7 214
## 248 Gentoo Biscoe 50.8 15.7 226
## 249 Gentoo Biscoe 49.4 15.8 216
## 250 Gentoo Biscoe 46.9 14.6 222
## 251 Gentoo Biscoe 48.4 14.4 203
## 252 Gentoo Biscoe 51.1 16.5 225
## 253 Gentoo Biscoe 48.5 15.0 219
## 254 Gentoo Biscoe 55.9 17.0 228
## 255 Gentoo Biscoe 47.2 15.5 215
## 256 Gentoo Biscoe 49.1 15.0 228
## 257 Gentoo Biscoe 47.3 13.8 216
## 258 Gentoo Biscoe 46.8 16.1 215
## 259 Gentoo Biscoe 41.7 14.7 210
## 260 Gentoo Biscoe 53.4 15.8 219
## 261 Gentoo Biscoe 43.3 14.0 208
## 262 Gentoo Biscoe 48.1 15.1 209
## 263 Gentoo Biscoe 50.5 15.2 216
## 264 Gentoo Biscoe 49.8 15.9 229
## 265 Gentoo Biscoe 43.5 15.2 213
## 266 Gentoo Biscoe 51.5 16.3 230
## 267 Gentoo Biscoe 46.2 14.1 217
## 268 Gentoo Biscoe 55.1 16.0 230
## 269 Gentoo Biscoe 44.5 15.7 217
## 270 Gentoo Biscoe 48.8 16.2 222
## 271 Gentoo Biscoe 47.2 13.7 214
## 272 Gentoo Biscoe NA NA NA
## 273 Gentoo Biscoe 46.8 14.3 215
## 274 Gentoo Biscoe 50.4 15.7 222
## 275 Gentoo Biscoe 45.2 14.8 212
## 276 Gentoo Biscoe 49.9 16.1 213
## body_mass_g sex year
## 1 3750 male 2007
## 2 3800 female 2007
## 3 3250 female 2007
## 4 NA <NA> 2007
## 5 3450 female 2007
## 6 3650 male 2007
## 7 3625 female 2007
## 8 4675 male 2007
## 9 3475 <NA> 2007
## 10 4250 <NA> 2007
## 11 3300 <NA> 2007
## 12 3700 <NA> 2007
## 13 3200 female 2007
## 14 3800 male 2007
## 15 4400 male 2007
## 16 3700 female 2007
## 17 3450 female 2007
## 18 4500 male 2007
## 19 3325 female 2007
## 20 4200 male 2007
## 21 3400 female 2007
## 22 3600 male 2007
## 23 3800 female 2007
## 24 3950 male 2007
## 25 3800 male 2007
## 26 3800 female 2007
## 27 3550 male 2007
## 28 3200 female 2007
## 29 3150 female 2007
## 30 3950 male 2007
## 31 3250 female 2007
## 32 3900 male 2007
## 33 3300 female 2007
## 34 3900 male 2007
## 35 3325 female 2007
## 36 4150 male 2007
## 37 3950 male 2007
## 38 3550 female 2007
## 39 3300 female 2007
## 40 4650 male 2007
## 41 3150 female 2007
## 42 3900 male 2007
## 43 3100 female 2007
## 44 4400 male 2007
## 45 3000 female 2007
## 46 4600 male 2007
## 47 3425 male 2007
## 48 2975 <NA> 2007
## 49 3450 female 2007
## 50 4150 male 2007
## 51 3500 female 2008
## 52 4300 male 2008
## 53 3450 female 2008
## 54 4050 male 2008
## 55 2900 female 2008
## 56 3700 male 2008
## 57 3550 female 2008
## 58 3800 male 2008
## 59 2850 female 2008
## 60 3750 male 2008
## 61 3150 female 2008
## 62 4400 male 2008
## 63 3600 female 2008
## 64 4050 male 2008
## 65 2850 female 2008
## 66 3950 male 2008
## 67 3350 female 2008
## 68 4100 male 2008
## 69 3050 female 2008
## 70 4450 male 2008
## 71 3600 female 2008
## 72 3900 male 2008
## 73 3550 female 2008
## 74 4150 male 2008
## 75 3700 female 2008
## 76 4250 male 2008
## 77 3700 female 2008
## 78 3900 male 2008
## 79 3550 female 2008
## 80 4000 male 2008
## 81 3200 female 2008
## 82 4700 male 2008
## 83 3800 female 2008
## 84 4200 male 2008
## 85 3350 female 2008
## 86 3550 male 2008
## 87 3800 male 2008
## 88 3500 female 2008
## 89 3950 male 2008
## 90 3600 female 2008
## 91 3550 female 2008
## 92 4300 male 2008
## 93 3400 female 2008
## 94 4450 male 2008
## 95 3300 female 2008
## 96 4300 male 2008
## 97 3700 female 2008
## 98 4350 male 2008
## 99 2900 female 2008
## 100 4100 male 2008
## 101 3725 female 2009
## 102 4725 male 2009
## 103 3075 female 2009
## 104 4250 male 2009
## 105 2925 female 2009
## 106 3550 male 2009
## 107 3750 female 2009
## 108 3900 male 2009
## 109 3175 female 2009
## 110 4775 male 2009
## 111 3825 female 2009
## 112 4600 male 2009
## 113 3200 female 2009
## 114 4275 male 2009
## 115 3900 female 2009
## 116 4075 male 2009
## 117 2900 female 2009
## 118 3775 male 2009
## 119 3350 female 2009
## 120 3325 male 2009
## 121 3150 female 2009
## 122 3500 male 2009
## 123 3450 female 2009
## 124 3875 male 2009
## 125 3050 female 2009
## 126 4000 male 2009
## 127 3275 female 2009
## 128 4300 male 2009
## 129 3050 female 2009
## 130 4000 male 2009
## 131 3325 female 2009
## 132 3500 male 2009
## 133 3500 female 2009
## 134 4475 male 2009
## 135 3425 female 2009
## 136 3900 male 2009
## 137 3175 female 2009
## 138 3975 male 2009
## 139 3400 female 2009
## 140 4250 male 2009
## 141 3400 female 2009
## 142 3475 male 2009
## 143 3050 female 2009
## 144 3725 male 2009
## 145 3000 female 2009
## 146 3650 male 2009
## 147 4250 male 2009
## 148 3475 female 2009
## 149 3450 female 2009
## 150 3750 male 2009
## 151 3700 female 2009
## 152 4000 male 2009
## 153 4500 female 2007
## 154 5700 male 2007
## 155 4450 female 2007
## 156 5700 male 2007
## 157 5400 male 2007
## 158 4550 female 2007
## 159 4800 female 2007
## 160 5200 male 2007
## 161 4400 female 2007
## 162 5150 male 2007
## 163 4650 female 2007
## 164 5550 male 2007
## 165 4650 female 2007
## 166 5850 male 2007
## 167 4200 female 2007
## 168 5850 male 2007
## 169 4150 female 2007
## 170 6300 male 2007
## 171 4800 female 2007
## 172 5350 male 2007
## 173 5700 male 2007
## 174 5000 female 2007
## 175 4400 female 2007
## 176 5050 male 2007
## 177 5000 female 2007
## 178 5100 male 2007
## 179 4100 <NA> 2007
## 180 5650 male 2007
## 181 4600 female 2007
## 182 5550 male 2007
## 183 5250 male 2007
## 184 4700 female 2007
## 185 5050 female 2007
## 186 6050 male 2007
## 187 5150 female 2008
## 188 5400 male 2008
## 189 4950 female 2008
## 190 5250 male 2008
## 191 4350 female 2008
## 192 5350 male 2008
## 193 3950 female 2008
## 194 5700 male 2008
## 195 4300 female 2008
## 196 4750 male 2008
## 197 5550 male 2008
## 198 4900 female 2008
## 199 4200 female 2008
## 200 5400 male 2008
## 201 5100 female 2008
## 202 5300 male 2008
## 203 4850 female 2008
## 204 5300 male 2008
## 205 4400 female 2008
## 206 5000 male 2008
## 207 4900 female 2008
## 208 5050 male 2008
## 209 4300 female 2008
## 210 5000 male 2008
## 211 4450 female 2008
## 212 5550 male 2008
## 213 4200 female 2008
## 214 5300 male 2008
## 215 4400 female 2008
## 216 5650 male 2008
## 217 4700 female 2008
## 218 5700 male 2008
## 219 4650 <NA> 2008
## 220 5800 male 2008
## 221 4700 female 2008
## 222 5550 male 2008
## 223 4750 female 2008
## 224 5000 male 2008
## 225 5100 male 2008
## 226 5200 female 2008
## 227 4700 female 2008
## 228 5800 male 2008
## 229 4600 female 2008
## 230 6000 male 2008
## 231 4750 female 2008
## 232 5950 male 2008
## 233 4625 female 2009
## 234 5450 male 2009
## 235 4725 female 2009
## 236 5350 male 2009
## 237 4750 female 2009
## 238 5600 male 2009
## 239 4600 female 2009
## 240 5300 male 2009
## 241 4875 female 2009
## 242 5550 male 2009
## 243 4950 female 2009
## 244 5400 male 2009
## 245 4750 female 2009
## 246 5650 male 2009
## 247 4850 female 2009
## 248 5200 male 2009
## 249 4925 male 2009
## 250 4875 female 2009
## 251 4625 female 2009
## 252 5250 male 2009
## 253 4850 female 2009
## 254 5600 male 2009
## 255 4975 female 2009
## 256 5500 male 2009
## 257 4725 <NA> 2009
## 258 5500 male 2009
## 259 4700 female 2009
## 260 5500 male 2009
## 261 4575 female 2009
## 262 5500 male 2009
## 263 5000 female 2009
## 264 5950 male 2009
## 265 4650 female 2009
## 266 5500 male 2009
## 267 4375 female 2009
## 268 5850 male 2009
## 269 4875 <NA> 2009
## 270 6000 male 2009
## 271 4925 female 2009
## 272 NA <NA> 2009
## 273 4850 female 2009
## 274 5750 male 2009
## 275 5200 female 2009
## 276 5400 male 2009
print(penguin_chinstrap_mass)
## body_mass_g species
## 277 3500 Chinstrap
## 278 3900 Chinstrap
## 279 3650 Chinstrap
## 280 3525 Chinstrap
## 281 3725 Chinstrap
## 282 3950 Chinstrap
## 283 3250 Chinstrap
## 284 3750 Chinstrap
## 285 4150 Chinstrap
## 286 3700 Chinstrap
## 287 3800 Chinstrap
## 288 3775 Chinstrap
## 289 3700 Chinstrap
## 290 4050 Chinstrap
## 291 3575 Chinstrap
## 292 4050 Chinstrap
## 293 3300 Chinstrap
## 294 3700 Chinstrap
## 295 3450 Chinstrap
## 296 4400 Chinstrap
## 297 3600 Chinstrap
## 298 3400 Chinstrap
## 299 2900 Chinstrap
## 300 3800 Chinstrap
## 301 3300 Chinstrap
## 302 4150 Chinstrap
## 303 3400 Chinstrap
## 304 3800 Chinstrap
## 305 3700 Chinstrap
## 306 4550 Chinstrap
## 307 3200 Chinstrap
## 308 4300 Chinstrap
## 309 3350 Chinstrap
## 310 4100 Chinstrap
## 311 3600 Chinstrap
## 312 3900 Chinstrap
## 313 3850 Chinstrap
## 314 4800 Chinstrap
## 315 2700 Chinstrap
## 316 4500 Chinstrap
## 317 3950 Chinstrap
## 318 3650 Chinstrap
## 319 3550 Chinstrap
## 320 3500 Chinstrap
## 321 3675 Chinstrap
## 322 4450 Chinstrap
## 323 3400 Chinstrap
## 324 4300 Chinstrap
## 325 3250 Chinstrap
## 326 3675 Chinstrap
## 327 3325 Chinstrap
## 328 3950 Chinstrap
## 329 3600 Chinstrap
## 330 4050 Chinstrap
## 331 3350 Chinstrap
## 332 3450 Chinstrap
## 333 3250 Chinstrap
## 334 4050 Chinstrap
## 335 3800 Chinstrap
## 336 3525 Chinstrap
## 337 3950 Chinstrap
## 338 3650 Chinstrap
## 339 3650 Chinstrap
## 340 4000 Chinstrap
## 341 3400 Chinstrap
## 342 3775 Chinstrap
## 343 4100 Chinstrap
## 344 3775 Chinstrap
We now have two dataframes containing data on different species, and one of them contains only a subset of the columns in the other (body mass and species). But if body mass and species are what we want to plot, this isn’t a problem for ggplot:
ggplot() +
geom_boxplot(data = penguins_nonchinstrap, aes(x = species, y = body_mass_g, color = species)) +
geom_boxplot(data = penguin_chinstrap_mass, aes(x = species, y = body_mass_g, color = species))
## Warning: Removed 2 rows containing non-finite values (stat_boxplot).
Another great tool ggplot provides is faceting. This allows you to separate data into subplots based on a column (or multiple columns):
basic_penguin_plot +
facet_wrap( ~ species)
## Warning: Removed 2 rows containing missing values (geom_point).
Notice that the x-axes (and y-axes) span the same range in all three panels; by default, facet_wrap( ) uses the same fixed scales for every subplot.
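If you want each panel to use its own axis range instead, facet_wrap( ) takes a scales argument ("fixed" by default, or "free_x", "free_y", "free"). A sketch with hand-copied Adelie and Gentoo rows, comparing how many x scales each version builds:

```r
library(ggplot2)

# Hand-copied demo rows (Adelie rows 1-3, Gentoo rows 153-155)
penguins_demo <- data.frame(
  species        = c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"),
  bill_length_mm = c(39.1, 39.5, 40.3, 46.1, 50.0, 48.7),
  bill_depth_mm  = c(18.7, 17.4, 18.0, 13.2, 16.3, 14.1)
)

fixed_facets <- ggplot(penguins_demo, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point() +
  facet_wrap(~ species)                       # default: scales = "fixed"

free_x_facets <- ggplot(penguins_demo, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point() +
  facet_wrap(~ species, scales = "free_x")    # each panel gets its own x range

# The built layout records which x scale each panel uses (SCALE_X column)
fixed_layout  <- ggplot_build(fixed_facets)$layout$layout
free_x_layout <- ggplot_build(free_x_facets)$layout$layout
fixed_layout$SCALE_X   # every panel shares x scale 1
free_x_layout$SCALE_X  # one x scale per panel
```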
Because ggplot is so popular, many additional packages have been written that build on top of it. Here are two examples.
Add animations to plots
#install.packages('gifski')
#install.packages('gganimate')
library(gganimate)
# animated_penguin_plot <-
# basic_penguin_plot + transition_states(species)
# save the animation with 4 frames, played at 1 frame per second
# anim_save("animated_penguin_plot.gif", animated_penguin_plot, nframes = 4, fps = 1)
# render plot using  outside of code chunk